nlp_architect.data.ptb.PTBDataLoader

class nlp_architect.data.ptb.PTBDataLoader(word_dict, seq_len=100, data_dir=os.path.expanduser('~/data'), dataset='WikiText-103', batch_size=32, skip=30, split_type='train', loop=True)[source]

Class that defines the data loader: iterates over a tokenized corpus and yields batches of fixed-length sequences.

__init__(word_dict, seq_len=100, data_dir=os.path.expanduser('~/data'), dataset='WikiText-103', batch_size=32, skip=30, split_type='train', loop=True)[source]

Initialize class

Parameters:
    word_dict – PTBDictionary object
    seq_len – int, sequence length of data
    data_dir – str, location of corpus data
    dataset – str, name of corpus
    batch_size – int, batch size
    skip – int, number of words to skip over while generating batches
    split_type – str, train/test/valid
    loop – boolean, whether or not to loop over data when it runs out
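
Example

A minimal construction sketch. It assumes PTBDictionary (defined in the same module) accepts data_dir and dataset arguments mirroring this loader, and that the WikiText-103 corpus already sits under ~/data; adjust to your installed nlp-architect version.

    import os

    from nlp_architect.data.ptb import PTBDictionary, PTBDataLoader

    # Build the vocabulary first, then hand it to the loader.
    word_dict = PTBDictionary(data_dir=os.path.expanduser('~/data'),
                              dataset='WikiText-103')

    train_loader = PTBDataLoader(
        word_dict,
        seq_len=100,             # length of each sequence in a batch
        data_dir=os.path.expanduser('~/data'),  # directory with corpus files
        dataset='WikiText-103',
        batch_size=32,
        skip=30,                 # words skipped between consecutive sequences
        split_type='train',
        loop=True,               # wrap around when the split is exhausted
    )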

Methods

__init__(word_dict[, seq_len, data_dir, …])   Initialize class
decode_line(tokens)                           Decode a given line from index to word
get_batch()                                   Get one batch of the data
load_series(path)                             Load all the data into an array
reset()                                       Reset the sample count to zero and re-shuffle the data
decode_line(tokens)[source]

Decode a given line from index to word

Parameters:
    tokens – list of word indexes

Returns: str, the decoded sentence
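
For example, continuing the construction sketch above (the index values here are hypothetical and depend on the built vocabulary):

    tokens = [12, 7, 431, 2]                 # hypothetical word indexes
    sentence = train_loader.decode_line(tokens)
    print(sentence)                          # decoded text, e.g. 'the model was trained'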
get_batch()[source]

Get one batch of the data

Returns: the next batch of data
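
A usage sketch, reusing train_loader from above. The batch structure is an assumption (an array-like of word indexes with batch_size rows of seq_len tokens each); inspect the return value in your version.

    batch = train_loader.get_batch()
    print(len(batch))                          # expected: 32 (batch_size above)
    print(train_loader.decode_line(batch[0]))  # view the first sequence as text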

load_series(path)[source]

Load all the data into an array

Parameters:
    path – str, location of the input data file

Returns: the loaded data as an array of word indexes
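
load_series is invoked internally during initialization, but it can also be called directly. A sketch, with a hypothetical corpus path (substitute the real tokenized file for your split):

    import os

    path = os.path.expanduser('~/data/wikitext-103/wiki.valid.tokens')  # hypothetical
    series = train_loader.load_series(path)
    print(series[:10])   # first ten entries, assuming an array of word indexes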

reset()[source]

Resets the sample count to zero and re-shuffles the data

Returns: None
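
A sketch of a multi-epoch loop built on reset(), reusing train_loader from above; num_batches is a hypothetical count chosen by the caller (with loop=True the loader wraps around on its own instead):

    num_batches = 100                     # hypothetical; derive from corpus size
    for epoch in range(3):
        for _ in range(num_batches):
            batch = train_loader.get_batch()
            # ... feed the batch to a model ...
        train_loader.reset()              # zero the sample count, reshuffle data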